Abstract

In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD) to
address the semi-supervised sentiment classification problem with deep learning. First, we construct the first several
hidden layers using restricted Boltzmann machines (RBM), which reduce the dimension and abstract the information of
the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines
(CRBM), which abstract the information of the reviews effectively. Third, the constructed deep architecture is fine-tuned by
gradient-descent-based supervised learning with an exponential loss function. Finally, an active learning method is combined
with the proposed deep architecture. We conducted several experiments on five sentiment classification datasets and show
that AHD is competitive with previous semi-supervised learning algorithms. Experiments are also conducted to verify the
effectiveness of the proposed method with different numbers of labeled and unlabeled reviews.

Introduction

Recently, more and more people write reviews and share
opinions on the World Wide Web, which presents a wealth of
information on products and services [1]. These reviews not
only help other users make better judgements but are also
useful resources for manufacturers to keep track of and
manage customer opinions [2]. However, because there are large
numbers of reviews for every topic, it is difficult for a user to
manually learn the opinions on a topic of interest. Sentiment
classification, which aims to classify a text according to the
expressed sentimental polarity of opinions such as 'positive' or
'negative', 'thumbs up' or 'thumbs down', 'favorable' or
'unfavorable' [3], can facilitate the investigation of
corresponding products or services.

In order to learn a good text classifier, a large number of labeled
reviews are often needed for training [4]. However, labeling
reviews is often difficult, expensive, or time consuming [5]. On the
other hand, it is much easier to obtain a large number of unlabeled
reviews, given the growing availability and popularity of online
review sites and personal blogs [6]. In recent years, a new
approach called semi-supervised learning, which uses large
amounts of unlabeled data together with labeled data to build
better learners [7], has been developed in the machine learning
community.

Several works on semi-supervised learning for sentiment
classification have achieved competitive performance [3,8-10].
However, most of the existing semi-supervised learning methods
are still far from satisfactory. As shown by several researchers
[11,12], deep architecture, which is composed of multiple levels of
non-linear operations, is expected to perform well in
semi-supervised learning because of its capability of modeling
hard artificial intelligence tasks. Deep belief networks (DBN), a
representative deep learning algorithm that has achieved notable
success in text classification, are directed belief nets with many
hidden layers constructed by restricted Boltzmann machines
(RBM) and refined by gradient-descent-based supervised learning
[12]. Ranzato and Szummer [13] propose an algorithm to learn
text document representations based on semi-supervised
auto-encoders that are combined to form a deep network. Zhou et
al. [10] propose a novel semi-supervised learning algorithm to
address the semi-supervised sentiment classification problem with
active learning. Socher et al. [14] introduce a novel machine
learning framework based on recursive autoencoders for
sentence-level prediction of sentiment label distributions. Socher
et al. [15] introduce the recursive neural tensor network for
semantic compositionality over a sentiment treebank. The key
issue of traditional DBN is the efficiency of RBM training.
Convolutional neural networks (CNN), which are specifically
designed to deal with the variability of two-dimensional shapes,
have had great success in machine learning tasks and represent
one of the early successes of deep learning [16]. Desjardins and
Bengio [17] adapt RBM to operate in a convolutional manner and
show that convolutional RBMs (CRBM) are more efficient than
standard RBMs.

CRBM has been applied successfully to a wide range of visual
and audio recognition tasks [18,19]. Despite the success of CRBM
in addressing two-dimensional problems, there is still no published
research on the use of CRBM in textual information processing.
In this paper, we propose a novel semi-supervised learning
algorithm called active hybrid deep belief networks (AHD) to
address the semi-supervised sentiment classification problem with
deep learning. AHD is an active learning method based on a deep
architecture in which the bottom layers are constructed by RBM
and the upper layers are constructed by CRBM; the whole deep
architecture is then fine-tuned by gradient-descent-based
supervised learning with an exponential loss function.

Figure 1. Architecture of HDBN.




Figure 2. Architecture of CRBM.

doi:10.1371/journal.pone.0107122.g002

Hybrid Deep Belief Networks Method

Problem formulation

The sentiment classification dataset is composed of many review
documents, and each review document is composed of a bag of words.
To classify these review documents using corpus-based approaches,
we need to preprocess them in advance. The preprocessing
method for these reviews is similar to that of [9,10]. We tokenize and
downcase each review and represent it as a vector of unigrams,
using a binary weight equal to 1 for terms present in the review.
Moreover, punctuation, numbers, and words of length one
are removed from the vector. Finally, we combine all the words in
the dataset, sort the vocabulary by document frequency, and
remove the top 1.5%, because many of these high-document-frequency
words are stopwords or domain-specific general-purpose
words.

After preprocessing, each review can be represented as a vector of
binary weights $x^i$. If the $j$-th word of the vocabulary is in the
review, $x^i_j = 1$; otherwise, $x^i_j = 0$. Then the dataset can be
represented as a matrix:

$$
X = [x^1, x^2, \ldots, x^{R+T}] =
\begin{bmatrix}
x^1_1 & x^2_1 & \cdots & x^{R+T}_1 \\
x^1_2 & x^2_2 & \cdots & x^{R+T}_2 \\
\vdots & \vdots & \ddots & \vdots \\
x^1_D & x^2_D & \cdots & x^{R+T}_D
\end{bmatrix}
\tag{1}
$$

where $R$ is the number of training reviews, $T$ is the number of test
reviews, and $D$ is the number of feature words in the dataset. Every
column of $X$ corresponds to a sample $x$, which is a representation
of a review. A sample that has all features is viewed as a vector in
$\mathbb{R}^D$, where the $j$-th coordinate corresponds to the $j$-th feature.

The $L$ labeled reviews are chosen randomly from the $R$ training
reviews, or chosen actively by active learning, which can be written
as:

$$
X^L = X^R(S), \quad S = [s_1, \ldots, s_L], \quad 1 \le s_i \le R
\tag{2}
$$

where $S$ is the index of the selected training reviews to be labeled
manually.

The $L$ labels corresponding to the $L$ labeled training reviews are
denoted as:

$$
Y^L = [y^1, y^2, \ldots, y^L] =
\begin{bmatrix}
y^1_1 & y^2_1 & \cdots & y^L_1 \\
\vdots & \vdots & \ddots & \vdots \\
y^1_C & y^2_C & \cdots & y^L_C
\end{bmatrix}
\tag{3}
$$

where $C$ is the number of classes. Every column of $Y$ is a vector in
$\mathbb{R}^C$, where the $j$-th coordinate corresponds to the $j$-th class:

$$
y^i_j =
\begin{cases}
1 & \text{if } x^i \in j\text{-th class} \\
-1 & \text{if } x^i \notin j\text{-th class}
\end{cases}
\tag{4}
$$

For example, if a review $x^i$ is positive, $y^i = [1, -1]'$; otherwise,
$y^i = [-1, 1]'$.

We intend to seek the mapping function $X \to Y$ using the $L$ labeled
data and all unlabeled data. After training, we can determine $y$ using
the mapping function when a new sample $x$ arrives.
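
The preprocessing and label encoding described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the experiments: the regular-expression tokenizer, the helper names, and the toy reviews are assumptions, and samples are stored as rows here whereas the matrix $X$ stores them as columns.

```python
import re
from collections import Counter


def tokenize(review):
    """Lowercase a review and keep alphabetic tokens longer than one character."""
    # Only runs of letters are matched, so punctuation and numbers are dropped.
    return [w for w in re.findall(r"[a-z]+", review.lower()) if len(w) > 1]


def build_vocabulary(reviews, drop_top_fraction=0.015):
    """Sort words by document frequency and drop the most frequent fraction."""
    doc_freq = Counter()
    for review in reviews:
        doc_freq.update(set(tokenize(review)))
    # Ties are broken alphabetically so that the result is deterministic.
    ranked = sorted(doc_freq, key=lambda w: (-doc_freq[w], w))
    n_drop = int(len(ranked) * drop_top_fraction)
    return ranked[n_drop:]


def to_binary_vector(review, vocabulary):
    """Return x with x[j] = 1 if vocabulary[j] occurs in the review, else 0."""
    present = set(tokenize(review))
    return [1 if word in present else 0 for word in vocabulary]


def encode_label(class_index, n_classes=2):
    """Return y with +1 at the review's class and -1 elsewhere (Eq. 4)."""
    return [1 if j == class_index else -1 for j in range(n_classes)]


# Toy corpus (hypothetical); class 0 is positive, class 1 is negative.
reviews = [
    "Great movie, loved it!",
    "Terrible plot; it was boring.",
    "Great cast but boring plot.",
]
# A large fraction is used only because the toy vocabulary is tiny.
vocab = build_vocabulary(reviews, drop_top_fraction=0.2)
X = [to_binary_vector(r, vocab) for r in reviews]
Y = [encode_label(0), encode_label(1), encode_label(1)]
```

With the default `drop_top_fraction=0.015`, a vocabulary of about 10,000 words would lose roughly its 150 most frequent entries.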
Architecture of HDBN

In this part, we propose a novel semi-supervised learning
method, HDBN, to address the sentiment classification problem.
The sentiment datasets have high dimension (about 10,000), and the
computational complexity of convolutional calculation is relatively
high, so we first use RBM with normal calculation to reduce the
dimension of the reviews. Fig. 1 shows the deep architecture of
HDBN, a fully interconnected directed belief net with one input
layer $h^0$, $N$ hidden layers $h^1, h^2, \ldots, h^N$, and one label layer at the
top. The input layer $h^0$ has $D$ units, equal to the number of
features of a sample review $x$. The hidden layers consist of $M$ layers
constructed by RBM and $N - M$ layers constructed by CRBM.
The label layer has $C$ units, equal to the number of classes of the label
vector $y$. The number of hidden layers and the number of units
in each hidden layer are currently pre-defined according to
experience or intuition. The search for the mapping function
$X \to Y$ is thus transformed into the problem of finding the
parameter space $W = \{w^1, w^2, \ldots, w^N\}$ for the deep architecture.

The training of the HDBN can be divided into two stages:

1. HDBN is constructed by greedy layer-wise unsupervised
learning using RBMs and CRBMs as building blocks. The $L$ labeled
data and all unlabeled data are utilized to find the parameter space
$W$ with $N$ layers.

2. HDBN is trained according to the exponential loss function
using gradient-descent-based supervised learning. The parameter
space $W$ is refined using the $L$ labeled data.

Unsupervised learning 

As shown in Fig. 1, we construct HDBN layer by layer using
RBMs and CRBMs. The details of RBM can be found in [12];
CRBM is introduced below.

The architecture of CRBM is shown in Fig. 2. Similar to RBM, it is
a two-layer recurrent neural network in which
stochastic binary input groups are connected to stochastic binary
output groups using symmetrically weighted connections. The top
layer represents a vector of stochastic binary hidden features $h^k$ and
the bottom layer represents a vector of binary visible data $h^{k-1}$,
$k = M+1, \ldots, N$. The $k$-th layer consists of $G_k$ groups, where each
group consists of $D_k$ units, resulting in $G_k \times D_k$ hidden units. The
layer $h^M$ consists of 1 group and $D_M$ units. $w^k$ is the symmetric
interaction term connecting corresponding groups between data
$h^{k-1}$ and feature $h^k$. However, compared with RBM, the weights
of CRBM between the hidden and visible groups are shared
among all locations [18], and the calculation is performed in a
convolutional manner [17].

We define the energy of the state $(h^{k-1}, h^k)$ as:

$$
E(h^{k-1}, h^k; \theta) =
- \sum_{s=1}^{G_{k-1}} \sum_{t=1}^{G_k} \left( \tilde{w}^k_{st} * h^{k-1}_s \right) \bullet h^k_t
- \sum_{s=1}^{G_{k-1}} b^{k-1}_s \sum_{u=1}^{D_{k-1}} h^{k-1}_{s,u}
- \sum_{t=1}^{G_k} c^k_t \sum_{v=1}^{D_k} h^k_{t,v}
\tag{5}
$$

where $\theta = (w, b, c)$ are the model parameters: $w^k_{st}$ is a filter between
unit $s$ in the layer $h^{k-1}$ and unit $t$ in the layer $h^k$, $k = M+1, \ldots, N$.
The dimension of the filter $w^k_{st}$ is equal to $D_{k-1} - D_k + 1$. $b^{k-1}_s$ is
the $s$-th bias of layer $h^{k-1}$ and $c^k_t$ is the $t$-th bias of layer $h^k$. A tilde
above an array ($\tilde{w}$) denotes flipping the array, $*$ denotes valid
convolution, and $\bullet$ denotes element-wise product followed by
summation, i.e., $A \bullet B = \operatorname{tr}(A^T B)$ [18].

A Gibbs sampler can be performed based on the following
conditional distributions.

The probability of turning on unit $v$ in group $t$ is a logistic
function of the states of $h^{k-1}$ and $w^k_{st}$:

$$
p(h^k_{t,v} = 1 \mid h^{k-1}) = \operatorname{sigm}\left( c^k_t + \Big( \sum_{s} \tilde{w}^k_{st} * h^{k-1}_s \Big)_v \right)
\tag{6}
$$

The probability of turning on unit $u$ in group $s$ is a logistic
function of the states of $h^k$ and $w^k_{st}$:

$$
p(h^{k-1}_{s,u} = 1 \mid h^k) = \operatorname{sigm}\left( b^{k-1}_s + \Big( \sum_{t} w^k_{st} \star h^k_t \Big)_u \right)
\tag{7}
$$

where the logistic function is:

$$
\operatorname{sigm}(z) = 1 / (1 + e^{-z})
\tag{8}
$$

A star $\star$ denotes full convolution.

Table 1. Algorithm of HDBN.

Input:
    data $X$, $Y^L$
    number of training data $R$; number of test data $T$;
    number of layers $N$; number of epochs $Q$;
    number of units in every hidden layer $D_1, \ldots, D_N$;
    number of groups in every convolutional hidden layer $G_M, \ldots, G_N$;
    hidden layer $h^1, \ldots, h^M$;
    convolutional hidden layer $h^{M+1}, \ldots, h^{N-1}$;
    parameter space $W = \{w^1, \ldots, w^N\}$;
    biases $b$, $c$; momentum $\vartheta$ and learning rate $\eta$
Output:
    deep architecture with parameter space $W$

1. Greedy layer-wise unsupervised learning
for $k = 1$; $k \le N-1$; $k$++ do
    for $q = 1$; $q \le Q$; $q$++ do
        for $r = 1$; $r \le R+T$; $r$++ do
            Calculate the non-linear positive and negative phase:
            if $k \le M$ then
                Normal calculation,
            else
                Convolutional calculation according to Eq. 6 and Eq. 7.
            end if
            Update the weights and biases:
            $w^k_{st} = \vartheta w^k_{st} + \eta \left( \langle h^{k-1}_{s,r} h^k_{t,r} \rangle_{P_0} - \langle h^{k-1}_{s,r} h^k_{t,r} \rangle_{P_1} \right)$
        end for
    end for
end for

2. Supervised learning based on gradient descent
    $\arg\min_{W} \sum_{i=1}^{L} \sum_{j=1}^{C} \exp\left( -h^N_j(x^i)\, y^i_j \right)$

doi:10.1371/journal.pone.0107122.t001

Table 2. Algorithm of AHD.

Input:
    data $X$, $(X^L, Y^L)$ (one positive and one negative)
    number of training data $R$
    number of iterations $I$
    number of active choosing data for every iteration $U$
    parameter space $W = \{w^1, \ldots, w^N\}$
Output:
    deep architecture with parameter space $W$

for $i = 1$; $i \le I$; $i$++ do
    Train HDBN with labeled dataset $X^L$ and all unlabeled data in $X$.
    Choose $U$ reviews that are near the separating line from the training dataset $X^R$ through Eq. 17.
    Add the $U$ reviews into the labeled dataset $X^L$.
end for
Train HDBN with labeled dataset $X^L$ and all unlabeled data in $X$.

doi:10.1371/journal.pone.0107122.t002



The convolution computation can extract the information of 
text effectively based on deep architecture, although it needs more 
computation time. 
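
As a rough illustration of the conditional distributions referred to as Eq. 6 and Eq. 7, the following sketch computes both for one-dimensional groups in plain Python. It assumes that each filter is a list of length $D_{k-1} - D_k + 1$ and that `w[s][t]` holds the filter between visible group `s` and hidden group `t`; the function names and toy values are illustrative and are not taken from the original implementation.

```python
import math


def sigm(z):
    """Logistic function (Eq. 8)."""
    return 1.0 / (1.0 + math.exp(-z))


def valid_correlate(signal, filt):
    """Valid convolution of `signal` with the flipped `filt`.

    Convolving with a flipped filter is the same as sliding the unflipped
    filter over the signal; the output has len(signal) - len(filt) + 1 entries.
    """
    n_out = len(signal) - len(filt) + 1
    return [sum(signal[v + a] * filt[a] for a in range(len(filt)))
            for v in range(n_out)]


def full_convolve(signal, filt):
    """Full convolution; the output has len(signal) + len(filt) - 1 entries."""
    out = [0.0] * (len(signal) + len(filt) - 1)
    for v, s_val in enumerate(signal):
        for a, f_val in enumerate(filt):
            out[v + a] += s_val * f_val
    return out


def hidden_probs(visible, w, c):
    """Eq. 6: probability that each hidden unit is on, given the visible groups."""
    probs = []
    for t in range(len(c)):
        total = [0.0] * (len(visible[0]) - len(w[0][t]) + 1)
        for s, group in enumerate(visible):
            term = valid_correlate(group, w[s][t])
            total = [x + y for x, y in zip(total, term)]
        probs.append([sigm(c[t] + x) for x in total])
    return probs


def visible_probs(hidden, w, b):
    """Eq. 7: probability that each visible unit is on, given the hidden groups."""
    probs = []
    for s in range(len(b)):
        total = [0.0] * (len(hidden[0]) + len(w[s][0]) - 1)
        for t, group in enumerate(hidden):
            term = full_convolve(group, w[s][t])
            total = [x + y for x, y in zip(total, term)]
        probs.append([sigm(b[s] + x) for x in total])
    return probs


# One visible group with 5 units, two hidden groups with 3 units each,
# so every filter has 5 - 3 + 1 = 3 entries (toy values).
visible = [[1, 0, 1, 1, 0]]
w = [[[0.5, -0.5, 0.5], [0.0, 1.0, 0.0]]]
p_hidden = hidden_probs(visible, w, c=[0.0, 0.0])
p_visible = visible_probs([[1, 0, 0], [0, 0, 0]], w, b=[0.0])
```

The valid convolution shrinks each group from 5 to 3 units on the way up, and the full convolution restores the 5 visible units on the way down, which is why the two directions use different convolution types.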

Supervised learning 

In HDBN, we construct the deep architecture using all labeled
reviews together with the unlabeled reviews by inputting them one
by one from layer $h^0$. The deep architecture is constructed layer by
layer from bottom to top, and each time the parameter space is
trained by the calculated data in the $(k-1)$-th layer.

According to the $w^k$ calculated by RBM and CRBM, the layer
$h^k$, $k = 1, \ldots, M$, can be computed as follows when a sample $x$
is input from layer $h^0$:

$$
h^k_t(x) = \operatorname{sigm}\left( c^k_t + \sum_{s=1}^{D_{k-1}} w^k_{st}\, h^{k-1}_s(x) \right), \quad t = 1, \ldots, D_k
\tag{9}
$$

When $k = M+1, \ldots, N-1$, the layer $h^k$ can be represented as:

$$
h^k_t(x) = \operatorname{sigm}\left( c^k_t + \sum_{s=1}^{G_{k-1}} \tilde{w}^k_{st} * h^{k-1}_s(x) \right), \quad t = 1, \ldots, G_k
\tag{10}
$$

When $k = N$, the layer $h^N$ can be represented as:

$$
h^N_t(x) = c^N_t + \sum_{s=1}^{D_{N-1}} w^N_{st}\, h^{N-1}_s(x), \quad t = 1, \ldots, D_N
\tag{11}
$$

The parameter space $w^N$ is initialized randomly, just as in the
backpropagation algorithm.

After greedy layer-wise unsupervised learning, $h^N(x)$ is the
representation of $x$. Then we use the $L$ labeled reviews to refine the
parameter space $W$ for better discriminative ability. This task can
be formulated as an optimization problem:

$$
\arg\min_{W} f\left( h^N(X^L), Y^L \right)
\tag{12}
$$

where

$$
f\left( h^N(X^L), Y^L \right) = \sum_{i=1}^{L} \sum_{j=1}^{C} T\left( h^N_j(x^i)\, y^i_j \right)
\tag{13}
$$

and the loss function is defined as

$$
T(r) = \exp(-r)
\tag{14}
$$

We use gradient descent through the whole HDBN to refine the
weight space. In the supervised learning stage, the stochastic
activities are replaced by deterministic, real-valued probabilities.

Classification using HDBN

The training procedure of HDBN is given in Table 1. For the
training of the HDBN architecture, the parameters are randomly
initialized with a normal distribution. All the reviews in the dataset
are used to train the HDBN with unsupervised learning. After
training, we can determine the label of new data through:

$$
\arg\max_{j} h^N_j(x)
\tag{15}
$$
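
The fine-tuning objective with the exponential loss $T(r) = \exp(-r)$ and the arg-max decision rule can be illustrated with a short Python sketch. It assumes that the network outputs $h^N(x^i)$ are already available as plain lists; the function names and the numeric values are hypothetical, and the gradient shown is only the derivative with respect to the outputs, not the full backpropagation through HDBN.

```python
import math


def exponential_loss(outputs, labels):
    """Sum of exp(-h * y) over all labeled reviews i and classes j."""
    return sum(math.exp(-h * y)
               for h_i, y_i in zip(outputs, labels)
               for h, y in zip(h_i, y_i))


def loss_gradient(outputs, labels):
    """Derivative of the loss with respect to every output unit."""
    return [[-y * math.exp(-h * y) for h, y in zip(h_i, y_i)]
            for h_i, y_i in zip(outputs, labels)]


def predict(output):
    """Index of the largest output unit (0 = positive, 1 = negative here)."""
    return max(range(len(output)), key=lambda j: output[j])


outputs = [[0.8, -0.6], [-0.2, 0.1]]  # hypothetical output-layer values
labels = [[1, -1], [-1, 1]]           # first review positive, second negative
loss = exponential_loss(outputs, labels)
grad = loss_gradient(outputs, labels)
```

Because every product $h^N_j(x^i)\, y^i_j$ is positive in this toy case, each term of the loss is below 1, and the gradient pushes each output further in the direction of its label.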



Active Hybrid Deep Belief Networks Method 

AHD description 

Given an unlabeled pool $X^R$ and an initial labeled dataset $X^L$
(one positive, one negative), the AHD architecture $h^N(x)$
decides which instance in $X^R$ to query next. The parameters
of $h^N(x)$ are then adjusted after the new reviews are labeled and
inserted into the labeled dataset $X^L$. We choose the reviews that
are near the separating hyperplane as the labeled training data.

Table 3. HDBN structure used in experiment.

Dataset    Structure
MOV        100-100-4-2
KIT        50-50-3-2
ELE        50-50-3-2
BOO        50-50-5-2
DVD        50-50-5-2

When HDBN is trained by the $L$ labeled data and all unlabeled
data, the parameters of the deep architecture are adjusted, and
$h^N(x)$ is the representation of $x$. Given an unlabeled pool $X^R$, the
next unlabeled instances to be queried are chosen according to the
location of $h^N(X^R)$. For review documents there are only 2 classes
(positive or negative), so the dimension of $h^N(x)$ is 2 and the class
separation line is $h^N_1 = h^N_2$. The distance between a point $h^N(x^i)$
and the separation line is:

$$
d(x^i) = \frac{\left| h^N_1(x^i) - h^N_2(x^i) \right|}{\sqrt{2}}
\tag{16}
$$

The selected training reviews to be labeled manually are given
by:

$$
s = \{\, i : d(x^i) = \min(d(X^R)) \,\}
\tag{17}
$$
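
The selection rule can be illustrated with a small Python sketch. It assumes the usual point-to-line distance $|h_1 - h_2| / \sqrt{2}$ for the separation line $h_1 = h_2$ and extends the single minimum to the $U$ smallest distances; the function names and output values are hypothetical.

```python
import math


def distance_to_line(output):
    """Distance from the point (h1, h2) to the separation line h1 = h2."""
    h1, h2 = output
    return abs(h1 - h2) / math.sqrt(2.0)


def select_queries(pool_outputs, n_queries):
    """Return the indices of the n_queries pool reviews closest to the line."""
    ranked = sorted(range(len(pool_outputs)),
                    key=lambda i: distance_to_line(pool_outputs[i]))
    return ranked[:n_queries]


# Hypothetical two-dimensional outputs for four unlabeled reviews.
pool_outputs = [[0.9, -0.8], [0.1, 0.05], [-0.4, 0.3], [0.2, 0.3]]
chosen = select_queries(pool_outputs, n_queries=2)
```

Reviews whose two outputs are nearly equal are the ones the current model is least certain about, so they are queried first.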



Classification using AHD 

The training procedure of AHD is given in Table 2. The
training set can be seen as an unlabeled pool. We randomly
select one positive and one negative review in the pool as
the initial labeled dataset $X^L$ used for supervised learning.
The number of iterations $I$ and the number of actively chosen
reviews $U$ for each iteration can be set manually based on the
number of labeled reviews in the experiment.

In each iteration, the HDBN architecture is first trained on all the
unlabeled reviews with unsupervised learning and on the existing
labeled reviews with supervised learning. Then $U$ reviews are
chosen from the unlabeled pool based on the distance of their
mapping results from the separating line. Finally, these $U$
reviews are labeled manually and added to the labeled dataset $X^L$.
In the next iteration, the HDBN architecture is re-trained
on all reviews with unsupervised learning and on the enlarged
labeled dataset $X^L$ with supervised learning. After the last
iteration, the HDBN architecture is retrained once more on all the
reviews with unsupervised learning and on the existing labeled
reviews with supervised learning.

After active training, we can use Eq. 15 to determine the
label of new data. The purpose of active learning is to choose
more useful labeled data to train the deep architecture, so that
fewer labeled data are needed to train a better classifier.
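
The control flow of this procedure can be sketched independently of the network itself. In the sketch below, `train`, `score`, and `oracle` are hypothetical callables standing in for HDBN training, the distance to the separating line, and manual labeling; the one-dimensional threshold model is a toy stand-in used only to make the loop runnable.

```python
def active_learning(pool, labeled, oracle, train, score, iterations, per_iteration):
    """Run the query loop: train, pick the least certain samples, label them.

    pool is a list of samples; labeled maps pool indices to labels.
    """
    labeled = dict(labeled)
    for _ in range(iterations):
        model = train(pool, labeled)
        candidates = [i for i in range(len(pool)) if i not in labeled]
        # Smaller scores mean the sample lies closer to the separating line.
        candidates.sort(key=lambda i: score(model, pool[i]))
        for i in candidates[:per_iteration]:
            labeled[i] = oracle(i)
    # Final retraining with every label collected so far.
    return train(pool, labeled), labeled


def toy_train(pool, labeled):
    """Toy model: threshold halfway between the two class means."""
    pos = [pool[i] for i, y in labeled.items() if y == 1]
    neg = [pool[i] for i, y in labeled.items() if y == -1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


pool = [-3.0, -2.0, -0.5, 0.4, 1.5, 3.0]
model, labeled = active_learning(
    pool,
    {0: -1, 5: 1},  # one negative and one positive seed
    oracle=lambda i: 1 if pool[i] > 0 else -1,
    train=toy_train,
    score=lambda threshold, x: abs(x - threshold),
    iterations=2,
    per_iteration=1,
)
```

In this toy run the two queried samples are the ones nearest the current threshold, which mirrors how the setting in the experiments grows the labeled set by $U$ reviews per iteration.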

Experiments 

Experimental setup 

We evaluate the performance of the proposed HDBN and AHD
methods using five sentiment classification datasets. The first
dataset is MOV [20], a classical movie review dataset.
The other four datasets contain product reviews from the
multi-domain sentiment classification corpus, including books
(BOO), DVDs (DVD), electronics (ELE), and kitchen appliances
(KIT) [21]. Each dataset contains 1,000 positive and 1,000
negative reviews.

The experimental setup is the same as in [9] and [10]. We divide the
2,000 reviews into ten equal-sized folds randomly, maintaining
balanced class distributions in each fold. Half of the reviews in
each fold are randomly selected as training data and the remaining
reviews are used for testing. Only the reviews in the training dataset
are used for the selection of labeled reviews by active learning. All
the algorithms are tested with cross-validation.
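
One way to produce such a split is sketched below in Python. The function name, the fixed seed, and the use of integer review ids are assumptions for illustration; the text does not say whether the train and test halves inside a fold are themselves class-balanced, so this sketch only balances the folds.

```python
import random


def make_folds(pos_ids, neg_ids, n_folds=10, seed=0):
    """Split review ids into class-balanced folds, each with a train and a test half."""
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = []
    for f in range(n_folds):
        # Every fold receives the same number of positive and negative reviews.
        fold = pos[f::n_folds] + neg[f::n_folds]
        rng.shuffle(fold)
        half = len(fold) // 2
        folds.append({"train": fold[:half], "test": fold[half:]})
    return folds


# Ids 0-999 stand for positive reviews and 1000-1999 for negative reviews.
folds = make_folds(range(1000), range(1000, 2000))
```

With 2,000 reviews this yields ten folds of 200 reviews, each split into 100 training and 100 test reviews.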

We compare the classification performance of HDBN with four
representative semi-supervised learning methods, i.e.,
semi-supervised spectral learning (Spectral) [22], transductive SVM
(TSVM) [23], deep belief networks (DBN) [12], and
personal/impersonal views (PIV) [3]. Spectral learning and TSVM
are two baseline methods for sentiment classification. DBN [12] is a
classical deep learning method proposed recently. PIV [3] is a
recently proposed sentiment classification method.

We also compare the classification performance of AHD with
three representative active semi-supervised learning methods, i.e.,
active learning (Active) [24], mine the easy classify the hard
(MECH) [9], and active deep networks (ADN) [10]. Active
learning [24] is a baseline active learning method for sentiment
classification. MECH [9] and ADN [10] are two recently proposed
active learning methods for sentiment classification.

Table 4. Test accuracy with 100 labeled reviews for semi-supervised learning.

Type        MOV     KIT     ELE     BOO     DVD
Spectral    67.3    63.7    57.7    55.8    56.2
TSVM        68.7    65.5    62.9    58.7    57.3
DBN         71.3    72.6    73.6    64.3    66.7
PIV         -       78.6    70.0    60.1    49.5
HDBN        72.2    74.8    73.8    66.0    70.3

Table 5. Test accuracy with 100 labeled reviews for active semi-supervised learning.

Type        MOV     KIT     ELE     BOO     DVD
Active      68.9    68.1    63.3    58.6    58.0
MECH        76.2    74.1    70.6    62.1    62.7
ADN         76.3    77.5    76.8    69.0    71.6
AHD         75      77      76.8    70.1    73.7

Performance of HDBN 

The HDBN architecture used in all our experiments has 2
normal hidden layers and 1 convolutional hidden layer; each
hidden layer has a different number of units for different sentiment
datasets. The deep structures used in our experiments for the
different datasets are listed in Table 3. For example, the HDBN
structure used in the MOV dataset experiment is 100-100-4-2, which
means that the 2 normal hidden layers have 100 units each, the
convolutional hidden layer has 4 groups, and the output layer has 2
units. The number of units in the input layer is the same as the
dimension of each dataset. For greedy layer-wise unsupervised
learning, we train the weights of each layer independently for a
fixed number of 30 epochs with the learning rate set to 0.1. The
initial momentum is 0.5, and after 5 epochs the momentum is set to
0.9. For supervised learning, we run 30 epochs, and three linear
searches are performed in each epoch.
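
The unsupervised training schedule can be written down as a small Python helper. The sketch assumes that "after 5 epochs" means epochs 1 to 5 use momentum 0.5 and later epochs use 0.9, and it uses the common velocity form of the momentum update; `momentum_for_epoch` and `momentum_step` are illustrative names, not part of the original implementation.

```python
def momentum_for_epoch(epoch, initial=0.5, final=0.9, switch_after=5):
    """Momentum for a 1-based epoch index under the assumed schedule."""
    return initial if epoch <= switch_after else final


def momentum_step(weight, velocity, gradient_estimate, epoch, learning_rate=0.1):
    """One momentum update for a single weight.

    gradient_estimate stands for the positive-phase minus negative-phase
    statistic used in the unsupervised update.
    """
    velocity = momentum_for_epoch(epoch) * velocity + learning_rate * gradient_estimate
    return weight + velocity, velocity


weight, velocity = 0.0, 0.0
for epoch in range(1, 31):  # 30 epochs, as in the experiments
    weight, velocity = momentum_step(weight, velocity, gradient_estimate=0.0, epoch=epoch)
```

The lower initial momentum is a common precaution while the gradient estimates are still changing quickly in the first few epochs.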

The test accuracies in cross-validation for five datasets and five
methods with semi-supervised learning are shown in Table 4. The
results of the first two methods are reported by [9]. The results of the
DBN method are reported by [10]. Li et al. [3] reported the results
of the PIV method. The result of PIV on the MOV dataset is empty
because [3] did not report it. HDBN is the proposed method.

From Table 4, we can see that HDBN obtains the best result on most
datasets, the exception being the KIT dataset, where it is slightly
worse than the PIV method. However, the preprocessing of the PIV
method is much more complicated than that of HDBN, and the PIV
results on the other datasets are much worse than those of HDBN.
HDBN is adapted from DBN, and the results of HDBN on all five
datasets are better than those of DBN. This could be attributed to
the convolutional computation in the HDBN structure and supports
the effectiveness of our proposed method.

Performance of AHD 

To evaluate the performance of AHD, we compare its results
with several previous active learning methods for sentiment
classification. The architectures used in this experiment are
listed in Table 3. We perform active learning for 5 iterations. In
each iteration, we select and label the 20 most uncertain
reviews, and then retrain the deep architecture on all of the
unlabeled reviews and the labeled reviews annotated so far. After 5
iterations, 100 labeled reviews are used for training.

The test accuracies in cross-validation for five datasets and four
methods with active semi-supervised learning are shown in
Table 5. The results of the first two methods are reported by
[9]. The results of the ADN method are reported by [10]. AHD is the
active learning method proposed in this paper. From Table 5,
we can see that the results of AHD are better than those of the
Active and MECH methods and competitive with those of the ADN
method. Because ADN and AHD are both deep learning methods,
these results indicate that deep architecture is well suited to
sentiment classification.

Figure 3. Test accuracy of HDBN with different number of unlabeled reviews on five datasets.

Figure 4. Test accuracy of ADN and AHD with different number of labeled reviews on five datasets.

doi:10.1371/journal.pone.0107122.g004

Performance with variance of unlabeled data 

To verify the contribution of unlabeled reviews to our proposed
method, we conducted several experiments with fewer unlabeled
reviews and 100 labeled reviews. We use the HDBN method in this
part: because AHD chooses the reviews to be labeled from an
unlabeled pool, it would be unfair to compare the performance of
AHD across unlabeled pools of different sizes.

The test accuracies of HDBN with different numbers of
unlabeled reviews and 100 labeled reviews on five datasets are
shown in Fig. 3. The architectures for HDBN used in this
experiment are listed in Table 3. We can see that the
performance of HDBN is much worse when using just 400
unlabeled reviews. However, when using more than 1200
unlabeled reviews, the performance of HDBN improves
noticeably. For most of the review datasets, the accuracy of HDBN
with 1200 unlabeled reviews is close to the accuracy with 1600 and
2000 unlabeled reviews. This shows that HDBN can achieve
competitive performance with just a few labeled reviews and an
appropriate number of unlabeled reviews. Considering the longer
training time required with more unlabeled reviews and the small
accuracy gain for HDBN, we suggest using an
appropriate number of unlabeled reviews in real applications.

Performance with variance of labeled data 

To verify the contribution of labeled reviews to our proposed
method, we conducted several experiments with different numbers
of labeled reviews on five datasets. To compare the active learning
performance with ADN [10], we use the AHD method in this
experiment, and all the experimental settings are the same as for
ADN. The architectures for AHD used in this experiment are listed
in Table 3.

The test accuracies of ADN and AHD with different numbers of
labeled reviews on five datasets are shown in Fig. 4. We can see
that the performance of AHD is better than that of ADN for most of
the experimental settings, although both are based on the DBN
method. This suggests that the convolutional computation performs
better than the normal computation in the deep
architecture for sentiment classification. We can also see that both
ADN and AHD achieve high accuracy even with just 20 labeled
reviews for training. This demonstrates the effectiveness of deep
learning methods for semi-supervised learning with very few
labeled reviews.

Conclusions 

In this paper, we propose a novel semi-supervised learning
method, AHD, to address the sentiment classification problem
with a small number of labeled reviews. AHD seamlessly
incorporates convolutional computation into the DBN architecture
and uses CRBM to abstract the review information effectively. One
promising property of AHD is that it can effectively use the
distribution of a large amount of unlabeled data together with
little label information in a unified framework. In particular, AHD
can greatly reduce the dimension of reviews through RBM and
abstract the information of reviews through the cooperation of RBM
and CRBM. An exponential loss function is then used to refine
the constructed deep architecture with little label information.
Moreover, AHD can actively choose the reviews to be labeled, which
improves the performance of the deep architecture effectively.

Experiments conducted on five sentiment datasets demonstrate
that AHD outperforms most of the previous methods and is
competitive with the DBN-based method, which demonstrates the
performance of deep architecture for sentiment classification.
Experiments are also conducted to verify the effectiveness of the
AHD method with different numbers of labeled reviews; the results
show that AHD can reach very competitive performance with few
labeled reviews and a large amount of unlabeled reviews. This provides
sound support for the effectiveness of AHD for real